Performance impact of run queue organization and synchronization on large-scale NUMA multiprocessor systems
Abstract
This paper studies the impact of run queue organization on the performance of synchronization methods in multiprocessor systems. Two run queue organizations are considered: distributed and hierarchical. For each, the performance impact of spinning and blocking synchronization methods is studied using two canonical workload types that require task synchronization: lock-accessing and barrier-synchronization workloads. The results show that the distributed organization is better when fine-grain synchronization is required. For large-granularity tasks, however, the performance of the distributed organization is unacceptable and the hierarchical organization should be used. Because the distributed organization is embedded in the hierarchical organization, both cases can be served: for coarse-granularity parallel applications, the hierarchical organization with its load sharing feature can be used; for fine-granularity parallel applications, the hierarchy of queues can be circumvented and round robin task assignment can be done on processor-local queues, as in the distributed organization. Therefore, the hierarchical organization is useful in general-purpose large-scale shared-memory multiprocessors.
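The two organizations described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class names, the three-level layout (local, group, root), the `group_size` and `batch` parameters, and the rule that an idle processor pulls a small batch from the root into its group queue are all assumptions made for illustration. The sketch only shows how round robin placement on local queues can coexist with a shared hierarchy that provides load sharing.

```python
from collections import deque


class DistributedRunQueues:
    """One local queue per processor; tasks are placed round robin."""

    def __init__(self, num_processors):
        self.local = [deque() for _ in range(num_processors)]
        self._next = 0  # processor that receives the next submitted task

    def submit(self, task):
        self.local[self._next].append(task)
        self._next = (self._next + 1) % len(self.local)

    def fetch(self, cpu):
        # No load sharing: a processor only ever looks at its own queue.
        queue = self.local[cpu]
        return queue.popleft() if queue else None


class HierarchicalRunQueues(DistributedRunQueues):
    """Local queues as leaves under shared group and root queues.

    The three-level layout and the batch transfer rule are hypothetical.
    """

    def __init__(self, num_processors, group_size, batch=2):
        super().__init__(num_processors)
        self.group_size = group_size
        self.batch = batch
        num_groups = -(-num_processors // group_size)  # ceiling division
        self.group = [deque() for _ in range(num_groups)]
        self.root = deque()

    def submit(self, task, bypass_hierarchy=False):
        if bypass_hierarchy:
            # Fine-grain case: behave like the distributed organization.
            super().submit(task)
        else:
            # Coarse-grain case: enter at the root; idle processors pull work.
            self.root.append(task)

    def fetch(self, cpu):
        local = self.local[cpu]
        group = self.group[cpu // self.group_size]
        if not local and not group:
            # Assumed policy: move up to `batch` tasks from root to group queue.
            for _ in range(min(self.batch, len(self.root))):
                group.append(self.root.popleft())
        for queue in (local, group):
            if queue:
                return queue.popleft()
        return None
```

In this sketch, passing `bypass_hierarchy=True` to `submit` corresponds to circumventing the hierarchy for fine-granularity applications, while the default path lets any idle processor obtain work through the shared queues.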
Related resources
Reducing Contention for Run Queue in Shared-Memory Multiprocessor Systems
Performance of parallel processing systems is sensitive to various hardware and software overheads and contention for hardware and software resources. Hardware resources such as interconnection network and memory introduce communication contention and memory contention that could seriously impact overall system performance. Software resources include critical data structures maintained by appli...
Performance Prediction and Evaluation of Parallel Processing on a NUMA Multiprocessor
Non-Uniform Memory Access (NUMA) architectures make it possible to build large-scale shared memory multiprocessor systems in comparison with non-scalable Uniform Memory Access (UMA) architectures. Most NUMA multiprocessor operations such as scheduling and synchronizing processes, accessing data from processors to memory models and allocating distributed memory space to different processors, are p...
Reducing Run Queue Contention in Shared Memory Multiprocessors
No single method for mitigating the performance problems of centralized and distributed run queues is entirely successful. A hierarchical run queue succeeds by borrowing the best features of both. Performance of parallel processing systems, especially large systems, is sensitive to various types of overhead and contention. Performance consequences may be serious when contention occurs ...
A Multiprocessor System with Non-Preemptive Earliest-Deadline-First Scheduling Policy: A Performability Study
This paper introduces an analytical method for approximating the performability of a firm real-time system modeled by a multi-server queue. The service discipline in the queue is earliest-deadline-first (EDF), which is an optimal scheduling algorithm. Real-time jobs with exponentially distributed relative deadlines arrive according to a Poisson process. All jobs have deadlines until the end of s...
Performance of Hierarchical Processor Scheduling in Shared-Memory Multiprocessor Systems
Processor scheduling policies can be broadly divided into space-sharing and time-sharing policies. Space-sharing policies partition system processors and each partition is allocated exclusively to a job. In time-sharing policies, processors are temporally shared by jobs (e.g., in a round robin fashion). Space-sharing policies can be either static (processor allocation remains constant during th...
Journal:
- Journal of Systems Architecture
Volume 43
Publication date: 1997